home *** CD-ROM | disk | FTP | other *** search
- -- From Wilfred.Hansen@LOAN1.SP.CS.CMU.EDU Fri Dec 18 08:27:40 1992
-
- -- Pseudo-code for a ratings algorithm
-
- -- This algorithm computes p.rating, the rating for each player, p,
- -- assuming that the player's initial rating, p.seed, is within a few
- -- stones of the correct value.
- -- The approach is to measure the likelihood of each player's rating
- -- and the likelihood of the outcome of each game given the difference
- -- in the opponent's ratings. Multiplying all these likelihoods gives
- -- likelihood of the set of game outcomes for the given set of ratings.
- --
- -- Each iteration step adjusts each player's rating by +delta, -delta,
- -- or not at all, depending on which increases the likelihood of the
- -- outcomes for the player's games. Then the likelihood for the entire
- -- collection of ratings and game outcomes is recomputed and the change
- -- by the current delta is accepted if the likelihood increased. The
- -- iteration repeats with diminishing values of delta until the desired
- -- significance is achieved.
-
- -- For general discussion of the statistics, see:
- -- Elo, Arpad E., Ratings of Chessplayers, past and present.
- -- NY: Arco Pub., 1986
- -- Joe, Harry, "Rating systems based on paired comparison models",
- -- Statistics and Probability Letters v. 11, 1991 p. 343-347
- -- David, H. A., "The method of paired comparisons"
-
-
- -- In this algorithm, a difference of one in rating is intended to
- -- indicate a difference of one stone in strength. There is no built-in
- -- normalization, so there is no way to know if a rating of 1.00
- -- corresponds to Shodan; however, relative values should be a
- -- fair approximation of relative strength.
- --
- -- Strictly speaking, the algorithm has no notion of a difference of one
- -- stone; the ratings are based on the players announced ratings. If
- -- players adopt consistent ratings that differ by, say, half a stone,
- -- the computed ratings will be based on that measure.
- --
- -- When entering rating values, dan values are entered as the corresponding
- -- positive number and the kyu value k is entered as 1-k (so 1 kyu is zero,
- -- 2 kyu is -1, and so on).
-
- -- Two special data types, Rating and Likelihood, appear below.
- -- Both are implemented as floating values.
- -- Rating is a player rating as defined just above.
- -- Likelihood is similar to a probability. It is a mapping of a rating
- -- or game result into [0,1] such that higher values correspond
- -- to more likely ratings or results.
-
- -- Here are parameters as used for the AGA rating system:
-
- p.seed: Rating -- Players's initial rating.
- The latest rating at which the player played in a tournament.
- p.sigma: Rating =
- .5 for players rated at least twice before
- .8 for players rated once before
- 1.0 for previously unrated players 10 kyu and above
- 2.0 for others
- px_sigma: Rating = 1.04 -- see description of game_po, below
-
- handicapeqv: Rating -- Rating point equivalents of handicaps
- -- Handicap is Nstones stones on board
- -- and a komi of additional points for white
- -- Nstones is 0 or 2...9
- -- -20 <= komi <= 20
- === if Nstones == 0 then .5 - .1 * komi else Nstones - .1*komi
-
- maxdelta: Rating = 4 -- In successive passes, adjust ratings by
- -- 4. 2. 1. 0.5 0.25 0.125 0.0625
- -- 0.03125 0.015625 0.0078125 0.00390625
-
- EPSILON: Number = 1e-15 -- Very small, positive non-zero value
- BIG: Number = 4 -- number of standard deviations of ratings change
- -- after which to use EPSILON as likelihood
-
- -- information for each game
- --
- Game:::
- handicapeqv: Rating -- see above
- po: Likelihood -- likelihood of the outcome given the
- -- current rating difference of the players
- oldpo: Likelihood -- po from prior iteration
- white: Player -- refers to player info for white player
- black: Player -- ditto for black player
- whitewins: Boolean -- TRUE if W wins
-
- -- information for each player
- --
- Player:::
- rating: Rating -- current rating estimate
- pr: Likelihood -- what is the likelihood of the current rating?
- oldpr: Likelihood -- pr from prior iteration
- seed: Rating -- initial rating
- sigma: Rating -- standard deviation of curve of likelihood of
- -- ratings in the neighborhood of seed.
- direction: {UP, DOWN, NONE} -- direction to adust rating
- -- also: a list of all games the player has played in
-
-
- -- "pr" is the likelihood that a given rating is correct
- -- "po" similarly the likelihood that a game outcome is correct
- -- Multiplying all pr and pr values computes the total likelihood,
- -- "pt", that is, the likelihood of all game outcomes given a
- -- particular set of ratings for the players.
-
-
- -- player_pr(p: Player): Likelihood
- -- Compute the "prior distribution", pr, of the rating for player 'p'.
- -- Assume the "correct" rating is in a sample space normally
- -- distributed about p.seed with standard deviation p.sigma.
- -- Compute z as the corresponding value in a normal
- -- distribution with mean of 0 and standard deviation of 1.
- -- Then compute the likelihood that z is the correct rating
- -- by evaluating the normal(0,1) density function at z.
- --
- player_pr(p: Player): Likelihood ===
- z = ((p.rating - p.seed) / p.sigma)
- return exp(-z*z/2) / sqrt(2*pi) -- normal density function
- -- exp(...)/sqrt(...) is the function for the normal curve,
- -- that is, a probability density function with mean 0 and s.d. 1.
-
-
- -- game_po(g: Game): Likelihood
- -- Compute the likelihood, po, of the outcome of game 'g'.
- -- The likelihood that white wins is the value of the normal
- -- distribution function evaluated at the ratings difference,
- -- as adjusted by the handicap stones and normalized by px_sigma.
- -- With px_sigma of 1.04:
- -- ratings likelihood
- -- difference white wins
- -- 0 .5
- -- 1 .83
- -- 2 .97
- -- That is, if the handicap is two stones too low, white will
- -- win 97 games out of a hundred.
- -- The likelihood of a black win is (1 - likelihood white wins).
- --
- game_po(g: Game): Likelihood ===
- rd = g.white.rating - g.black.rating - g.handicapeqv
- p = .5 + erf((rd/px_sigma) / sqrt(2)) / 2
- if g.whitewins return p else return 1-p
- --
- -- erf(x) = [2/sqrt(pi)] * integral from 0 to x of exp(-t*t) dt
- -- Some Unix systems have erf. In the expression given, it computes
- -- the normal distribution function, that is, the integral of the
- -- normal density function from negative infinity
- -- to rd/px_sigma, the value whose likelihood we want.
-
- -- Ralston, Anthony, A First Course in Numerical Analysis,
- -- McGraw-Hill (New York, 1965), p. 21:
- -- "the normal distribution function corresponding to the
- -- normal density function with zero mean and variance n/12
- -- is given by 0.5 + 0.5 * erf(sqrt(6/n)x)".
- -- In the algorithm, the variance is 1, so n = 12.
-
-
- -- propose_ratings(delta: Rating)
- -- For a ratings change of 'delta', adjust all players
- -- according to their directions and recompute po for each game.
- --
- propose_ratings(delta: Rating) ===
- for each player, p
- case p.direction
- UP: p.rating += delta
- DOWN: p.rating -= delta
- NONE:
- p.oldpr = p.pr
- p.pr = player_pr(p.rating, p.seed)
- for each game, g
- g.oldpo = g.po
- g.po = game_po(g)
-
-
- --revert_ratings(delta: Rating)
- -- For a ratings change of 'delta', revert to prior ratings.
- --
- revert_ratings(delta: Rating) ===
- for each player, p
- case p.direction
- UP: p.rating -= delta
- DOWN: p.rating += delta
- p.pr = p.oldpr
- for each game, g
- g.po = g.oldpo
-
-
- -- compute_pt(): Likelihood
- -- Compute the likelihood of the combination of all current ratings
- -- and the outcome of all games.
- -- In practice, repeated multiplication of probabilities can lead to
- -- floating underflow. It is preferable to use the logarithms of these
- -- values and use addition instead of multiplication. Indeed, the entire
- -- algorithm can be rewritten to utilize logarithms of Likelihood values.
- --
- compute_pt(): Likelihood ===
- pt = 1
- for each player, p
- pt *= p.pr
- for each game, g
- pt *= g.po
- return pt
-
-
- --compute_player_impact(p: Player, delta: Rating): Number
- -- Compute the ratio ptNew/pt, where
- -- pt is total likelihood given the current assignment of ratings
- -- and ptNew is the likelihood resulting from
- -- changing the rating for player 'p' by 'delta'.
- -- If this value is greater than one, the new rating is more accurate.
- -- The tests against BIG are because after four or five sd's,
- -- the estimated rating has zero likelihood. Therefore we give
- -- the likelihood a little slope back toward the seed to
- -- minimize the possibility of a plateau in the search.
- --
- compute_player_impact(p: Player, delta: Rating): Number ===
- r_save = p.rating
- p.rating += delta -- temporarily change rating
-
- before = abs(r_save - p.seed) / p.sigma
- after = abs(p.rating - p.seed) / p.sigma
- if before > BIG and after > BIG
- if after < before
- pfactor = 1 + EPSILON -- slightly better
- else
- pfactor = 1 / (1 + EPSILON) -- slightly worse
- else if after > BIG
- pfactor = EPSILON / p.pr -- off the edge - very unlikely
- else
- pfactor = player_pr(p) / p.pr
- for each game, g, that p has played
- pfactor *= game_po(g) / g.po
-
- p.rating = r_save
- return pfactor
-
-
- -- estimate_ratings(): Likelihood
- -- Estimate ratings simultaneously for all players and games.
- -- Returns pt, the likelihood of the outcome, given the new ratings.
- --
- estimate_ratings(): Likelihood ===
-
- -- set ratings to seed values
- for each player, p
- p.rating = p.seed
- p.direction = NONE
-
- -- calculate initial values of all pt components
- propose_ratings(0) -- compute initial po and pr
- best_pt = compute_pt() -- compute resulting pt
-
- -- search for best new rating values
- delta = max_delta
- while delta >= .002 -- ensure two decimal places of accuracy
- for each player, p
- -- decide whether rating should go up or down
- chng_plus = compute_player_impact(p, delta)
- chng_minus = compute_player_impact(p, -delta)
- if chng_plus < 1 and chng_minus < 1 then
- p.direction = NONE
- else if chng_plus > chng_minus
- p.direction = UP
- else -- chng_minus > chng_plus
- p.direction = DOWN
-
- -- change ratings and repeat with same or smaller delta
- propose_ratings(delta)
- new_pt = compute_pt()
- if new_pt > best_pt + EPSILON
- best_pt = new_pt
- -- do next cycle with same delta
- else if new_pt >= best_pt
- -- pt is no worse, but not much better; decrease delta
- delta /= 2
- else
- -- pt is no better; undo the change & decrease delta
- revert_ratings(delta)
- delta /= 2
-
- return pt
-
-
-